[cpu][bench] Add CPU paged attention benchmarks #31720

Merged
bigPYJ1151 merged 2 commits into vllm-project:main from fadara01:cpu_attn_benchmark
Jan 6, 2026

Conversation

Contributor

@fadara01 fadara01 commented Jan 5, 2026

Purpose

Add CPU paged attention benchmarks
Fixes: #30374

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • [ Y] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the performance label (Performance-related issues) Jan 5, 2026
@mergify mergify bot added the cpu label (Related to CPU backends) Jan 5, 2026
Contributor Author

fadara01 commented Jan 5, 2026

@bigPYJ1151 could you please review?

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a benchmark script for CPU paged attention, which is a valuable addition for performance testing and optimization. The script is well-structured and provides a good range of configurable parameters. My review focuses on improving the robustness of the main benchmark function to prevent potential runtime errors if it's used in different contexts.

Comment thread benchmarks/kernels/cpu/benchmark_cpu_attn.py Outdated
Fixes: vllm-project#30374

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Member

@bigPYJ1151 bigPYJ1151 left a comment

Thanks!

@bigPYJ1151 bigPYJ1151 enabled auto-merge (squash) January 6, 2026 08:22
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 6, 2026
@bigPYJ1151 bigPYJ1151 merged commit 799b572 into vllm-project:main Jan 6, 2026
17 checks passed
LucasWilkinson pushed a commit to neuralmagic/vllm that referenced this pull request Jan 6, 2026
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>

Labels

cpu: Related to CPU backends
performance: Performance-related issues
ready: ONLY add when PR is ready to merge/full CI is needed

Development

Successfully merging this pull request may close these issues.

[Feature][CPU Backend]: Add Paged Attention Benchmarks for CPU backend

2 participants